Domain adaptation algorithms are widely used for cross-corpus speech emotion recognition. However, in pursuing minimal domain discrepancy, many of these algorithms sacrifice the discriminability of target-domain samples, which then cluster densely around the model's decision boundary and degrade its performance. To address this problem, a cross-corpus speech emotion recognition method based on Decision Boundary Optimized Domain Adaptation (DBODA) was proposed. Firstly, the features were processed by convolutional neural networks. Then, the features were fed into the Maximum Nuclear-norm and Mean Discrepancy (MNMD) module, which maximized the nuclear norm of the emotion prediction probability matrix of the target domain while reducing the inter-domain discrepancy, thereby enhancing the discriminability of the target-domain samples and optimizing the decision boundary. In six sets of cross-corpus experiments built on the Berlin, eNTERFACE and CASIA speech databases, the average recognition accuracy of the proposed method is 1.68 to 11.01 percentage points higher than those of the other algorithms, indicating that the proposed model effectively reduces the sample density around the decision boundary and improves the prediction accuracy.
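The objective of the MNMD module described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the linear-kernel form of the mean discrepancy, the trade-off parameter `weight`, and all function names are assumptions made for illustration only.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax with the usual max-subtraction for stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def mean_discrepancy(source_feats, target_feats):
    """Squared distance between the two feature means (a linear-kernel MMD, assumed here)."""
    diff = source_feats.mean(axis=0) - target_feats.mean(axis=0)
    return float(diff @ diff)

def nuclear_norm(probs):
    """Sum of the singular values of the (batch x classes) prediction matrix."""
    return float(np.linalg.svd(probs, compute_uv=False).sum())

def mnmd_loss(source_feats, target_feats, target_logits, weight=1.0):
    """Loss to minimize: domain discrepancy minus the batch-normalized nuclear norm.

    `weight` is a hypothetical trade-off parameter, not taken from the paper.
    """
    probs = softmax(target_logits)
    batch_size = probs.shape[0]
    return mean_discrepancy(source_feats, target_feats) - weight * nuclear_norm(probs) / batch_size

# Confident, diverse predictions versus maximally uncertain ones.
confident = np.eye(4)               # each sample assigned to a different class
uncertain = np.full((4, 4), 0.25)   # every sample sits on the decision boundary
print(nuclear_norm(confident), nuclear_norm(uncertain))
```

In this sketch the confident batch has a larger nuclear norm than the uncertain one, which is why maximizing the norm pushes target samples away from the decision boundary.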
To address the artifacts caused by phase disorder in frequency-domain speech enhancement algorithms, which limit denoising performance and degrade speech quality, a speech enhancement algorithm based on Multi-Scale Ladder-type Time-Frequency Conformer Generative Adversarial Network (MSLTF-CMGAN) was proposed. Taking the real part, imaginary part and magnitude spectrum of the speech spectrogram as input, the generator first learned the local and global feature dependencies in the time and frequency domains by using time-frequency Conformers at multiple scales. Secondly, the Mask Decoder branch was used to learn the amplitude mask, the Complex Decoder branch was used to learn the clean spectrogram directly, and the outputs of the two decoder branches were fused to obtain the reconstructed speech. Finally, the metric discriminator was used to predict the scores of speech evaluation metrics, and the generator was trained to produce high-quality speech through minimax training. Comparison experiments with various speech enhancement models were conducted on the public dataset VoiceBank+Demand by using the subjective evaluation metric Mean Opinion Score (MOS) and objective evaluation metrics. Experimental results show that compared with the current state-of-the-art speech enhancement method CMGAN (Conformer-based MetricGAN), MSLTF-CMGAN improves the MOS prediction of signal distortion (CSIG) and the MOS prediction of the intrusiveness of background noise (CBAK) by 0.04 and 0.07 respectively. Although its Perceptual Evaluation of Speech Quality (PESQ) and MOS prediction of the overall effect (COVL) are slightly lower than those of CMGAN, it still outperforms the other comparison models in several subjective and objective speech evaluation metrics.
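The abstract above does not state how the two decoder outputs are fused. The NumPy sketch below shows one plausible rule, assumed here for illustration: the amplitude mask scales the noisy magnitude, the noisy phase is reused, and the Complex Decoder output is added as a complex-valued correction. The function name and argument layout are hypothetical.

```python
import numpy as np

def fuse_decoder_outputs(noisy_spec, mask, complex_out):
    """Fuse a magnitude mask and a complex estimate (assumed additive fusion).

    noisy_spec  : complex array (freq x time), STFT of the noisy speech
    mask        : real array of the same shape, output of the mask branch
    complex_out : complex array of the same shape, output of the complex branch
    """
    masked_magnitude = mask * np.abs(noisy_spec)
    noisy_phase = np.angle(noisy_spec)
    # Masked magnitude recombined with the noisy phase.
    masked_spec = masked_magnitude * np.exp(1j * noisy_phase)
    # The complex branch refines both magnitude and phase.
    return masked_spec + complex_out

# Toy usage with a 2 x 2 spectrogram.
noisy = np.array([[1 + 1j, 2 - 1j], [0.5j, -3 + 0j]])
enhanced = fuse_decoder_outputs(noisy, np.full(noisy.shape, 0.5), np.zeros_like(noisy))
print(enhanced)
```

Because the correction term is complex, the fused result is not tied to the noisy phase, which is the property that targets the phase-disorder artifacts the abstract mentions.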
Rate-Distortion (R-D) optimization is a crucial technique in video encoders. However, the widely used independent R-D optimization is far from globally optimal. To further improve the compression performance of High Efficiency Video Coding (HEVC), a two-pass encoding algorithm that combines R-D dependency with R-D characteristics was proposed. Firstly, the current frame was encoded with the original HEVC method to obtain the number of bits consumed by the frame and the R-D model parameters of each Coding Tree Unit (CTU). Then, based on temporally dependent rate-distortion optimization, the optimal Lagrange multiplier and quantization parameter of each CTU were determined from information including the bit budget of the current frame and the R-D model parameters. Finally, the current frame was re-encoded, with each CTU pursuing its own optimization goal according to its Lagrange multiplier. Experimental results show that the proposed algorithm significantly improves rate-distortion performance. Specifically, compared with the original HEVC encoder at the same coding quality, the proposed algorithm saves 3.5% and 3.8% of the bitrate under the low-delay B and low-delay P coding configurations respectively.
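The second-pass allocation step can be sketched as follows. The sketch rests on assumptions that the abstract does not confirm: each CTU follows a hyperbolic R-D model D = c * R^(-k), temporal dependency is captured by a per-CTU distortion weight w, and the frame-level multiplier is found by bisection so that the CTU rates add up to the bit budget. All names are hypothetical, and the mapping from Lagrange multiplier to quantization parameter is left out.

```python
def ctu_rate(lmbda, c, k):
    """Rate at which the model D = c * R**(-k) has slope dD/dR = -lmbda."""
    return (lmbda / (c * k)) ** (-1.0 / (k + 1.0))

def allocate_ctu_lambdas(ctus, bit_budget, iterations=100):
    """Return one (lambda, rate) pair per CTU.

    ctus       : list of (c, k, w) tuples; w > 1 marks a CTU that later frames depend on
    bit_budget : total rate available to the frame, in the units of the R-D model

    Minimizing sum(w_i * D_i) + lam * sum(R_i) gives each CTU the multiplier lam / w_i.
    """
    low, high = 1e-9, 1e9
    for _ in range(iterations):
        mid = (low * high) ** 0.5          # bisection in the log domain
        total = sum(ctu_rate(mid / w, c, k) for c, k, w in ctus)
        if total > bit_budget:
            low = mid                      # over budget: a larger multiplier is needed
        else:
            high = mid
    lam = (low * high) ** 0.5
    return [(lam / w, ctu_rate(lam / w, c, k)) for c, k, w in ctus]

# Two CTUs with identical content; the second is referenced more by later frames.
for ctu_lambda, rate in allocate_ctu_lambdas([(10.0, 1.0, 1.0), (10.0, 1.0, 2.0)], 100.0):
    print(ctu_lambda, rate)
```

In this toy case the more heavily referenced CTU receives the smaller multiplier and therefore the larger share of the budget, which is the behavior the dependency-aware step is meant to produce.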
To solve the problems of excessive pathfinding time and of collision and blocking during movement in real-time strategy games, a combined improved flow field pathfinding algorithm was proposed. Firstly, a red-black tree was used to store data to speed up data access. Secondly, a penalty function was used to transform the nonlinear partial differential equation problem into a linear unconstrained problem, which simplified the calculation of the integration field cost. Finally, a pre-adjacency node was introduced to generate the flow direction. Compared with the unimproved flow field pathfinding algorithm, the improved algorithm reduces the path calculation time by 20% and keeps the average moving time stable at 20 s. Experimental results show that the improved flow field pathfinding algorithm can effectively shorten the pathfinding time, increase the moving speed of Agents and improve the level of game artificial intelligence in real-time strategy games.
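For context, the baseline that the abstract above improves on can be sketched in plain Python. This is the standard, unimproved flow field pipeline (cost field, integration field, flow direction), not the proposed algorithm: a binary heap stands in for the red-black tree, and neither the penalty function nor the pre-adjacency node is modeled. The grid layout and function names are assumptions.

```python
import heapq

NEIGHBORS = ((1, 0), (-1, 0), (0, 1), (0, -1))

def integration_field(cost, goal):
    """Dijkstra from the goal; cost[r][c] is a cell's traversal cost, None marks a wall."""
    rows, cols = len(cost), len(cost[0])
    dist = [[float("inf")] * cols for _ in range(rows)]
    dist[goal[0]][goal[1]] = 0
    heap = [(0, goal[0], goal[1])]
    while heap:
        d, r, c = heapq.heappop(heap)
        if d > dist[r][c]:
            continue  # stale heap entry
        for dr, dc in NEIGHBORS:
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and cost[nr][nc] is not None:
                nd = d + cost[nr][nc]
                if nd < dist[nr][nc]:
                    dist[nr][nc] = nd
                    heapq.heappush(heap, (nd, nr, nc))
    return dist

def flow_field(dist):
    """Point every reachable cell at its cheapest neighbor; the goal gets (0, 0)."""
    rows, cols = len(dist), len(dist[0])
    flow = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if dist[r][c] == float("inf"):
                continue  # wall or unreachable cell
            best, best_d = (0, 0), dist[r][c]
            for dr, dc in NEIGHBORS:
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and dist[nr][nc] < best_d:
                    best, best_d = (dr, dc), dist[nr][nc]
            flow[r][c] = best
    return flow

# 3 x 3 map with a wall in the center and the goal in the top-left corner.
grid = [[1, 1, 1], [1, None, 1], [1, 1, 1]]
field = flow_field(integration_field(grid, (0, 0)))
for row in field:
    print(row)
```

Every Agent on the map reads its direction from the same precomputed field, so the per-Agent cost is a single lookup; this is why flow fields suit the large unit counts of real-time strategy games.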